Sesame Releases CSM Model: Real-time Emotion-Customized AI Speech Synthesis Reaches New Heights
On March 13th, Sesame unveiled its latest speech synthesis model, CSM, drawing significant industry attention. According to the official introduction, CSM uses an end-to-end, Transformer-based multimodal architecture that draws on conversational context to generate natural, emotionally rich speech with strikingly realistic voices. The model generates speech in real time and accepts both text and audio as input. Users can also adjust parameters to control features such as tone, intonation, rhythm, and emotion, giving the system a high degree of flexibility. CSM is widely regarded as a breakthrough in AI speech technology.
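To make the parameter-driven control described above concrete, the sketch below models what a request to such a context-aware speech system might look like: a text to synthesize, prior conversational turns as context, and prosody knobs for tone, rhythm, and emotion. All names here (`Utterance`, `SpeechRequest`, `build_request`) are illustrative assumptions for exposition, not Sesame's actual CSM API.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Utterance:
    """One prior turn of the conversation used as context."""
    speaker: int  # speaker id within the conversation
    text: str     # transcript of the turn
    # CSM also conditions on audio; audio payloads are omitted here for brevity

@dataclass
class SpeechRequest:
    """Hypothetical request shape for context-conditioned synthesis."""
    text: str                                          # text to synthesize
    context: list = field(default_factory=list)        # prior Utterance turns
    tone: str = "neutral"      # e.g. "warm", "neutral", "formal"
    rhythm: float = 1.0        # relative speaking rate (1.0 = default)
    emotion: str = "calm"      # target emotional coloring

def build_request(req: SpeechRequest) -> dict:
    """Validate prosody parameters and serialize the request to a plain dict."""
    if not 0.5 <= req.rhythm <= 2.0:
        raise ValueError("rhythm must be between 0.5 and 2.0")
    # asdict recurses into the nested Utterance dataclasses automatically
    return asdict(req)

# Example: synthesize a reply conditioned on one prior turn
ctx = [Utterance(speaker=0, text="Hi, how are you?")]
req = SpeechRequest(text="Doing great, thanks!", context=ctx,
                    tone="warm", rhythm=1.1, emotion="cheerful")
payload = build_request(req)
```

Keeping context as a list of prior turns mirrors the article's point that the model understands conversational context rather than synthesizing each sentence in isolation.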